Estimating Mutual Information in Under-Reported Variables

Authors

  • Konstantinos Sechidis
  • Matthew Sperrin
  • Emily Petherick
  • Gavin Brown
Abstract

Under-reporting occurs in survey data when respondents have a reason to systematically misreport the answer to a question. For example, in studies of low birth weight infants, the smoking habits of the mother are very likely to be misreported. This introduces problems, such as bias, into the calculation of effect sizes, yet these problems are commonly ignored for lack of generally accepted solutions. We reinterpret under-reporting as a problem of learning from missing data, and in particular of learning from positive and unlabelled data. This formalisation provides a simple way to incorporate prior knowledge of the misreporting, and we show how that knowledge can be used to derive corrected point and interval estimates of the mutual information. We then show that our corrected estimators outperform more complex approaches, and we present applications of our theoretical results to real-world problems and machine learning tasks.
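The correction described in the abstract can be sketched for two binary variables under a one-sided (positive-and-unlabelled) misreporting model: a true positive is reported with a known probability `p_report`, and a true negative is never reported as positive. The function names and the assumption that `p_report` is known exactly are illustrative; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def mutual_information(joint):
    """Plug-in mutual information (in nats) from a 2x2 joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y
    mask = joint > 0                        # skip empty cells (0 log 0 = 0)
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

def corrected_mi(x_obs, y, p_report):
    """MI estimate that corrects one-sided under-reporting of x.

    Assumes each true positive is observed as positive with known
    probability p_report, and no true negative is reported positive,
    so E[P(x_obs=1, y)] = p_report * P(x_true=1, y).
    """
    joint_obs = np.zeros((2, 2))
    np.add.at(joint_obs, (np.asarray(x_obs), np.asarray(y)), 1)
    joint_obs /= joint_obs.sum()
    # Inflate the observed-positive cells back to their true level,
    # then move the recovered probability mass out of the negative cells.
    joint = joint_obs.copy()
    joint[1, :] = joint_obs[1, :] / p_report
    joint[0, :] = joint_obs[0, :] - (joint[1, :] - joint_obs[1, :])
    joint = np.clip(joint, 0, None)         # guard against sampling noise
    joint /= joint.sum()
    return mutual_information(joint)
```

With `p_report=1.0` the function reduces to the naive plug-in estimator, so the same code can be used to compare the naive and corrected estimates on simulated under-reported data.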


Related articles

Estimating Mutual Information for Discrete-Continuous Mixtures

Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle two s...


On Classification of Bivariate Distributions Based on Mutual Information

Among all measures of independence between random variables, mutual information is the only one that is based on information theory. Mutual information takes into account all kinds of dependencies between variables, i.e., both the linear and non-linear dependencies. In this paper we have classified some well-known bivariate distributions into two classes of distributions based on their mutua...


Estimating Mutual Information by Local Gaussian Approximation

Estimating mutual information (MI) from samples is a fundamental problem in statistics, machine learning, and data analysis. Recently it was shown that a popular class of non-parametric MI estimators perform very poorly for strongly dependent variables and have sample complexity that scales exponentially with the true MI...


Speeding up Feature Selection by Using an Information Theoretic Bound

The paper proposes a technique for speeding up the search of the optimal set of features in classification problems where the input variables are discrete or nominal. The approach is based on the definition of an upper bound on the mutual information between the target and a set of d input variables. This bound is derived as a function of the mutual information of its subsets of d − 1 cardinali...


The Informativeness of Reported Earnings and Characteristics of the Audit Committee

An information-usefulness approach to decision making holds that only information that brings valuable messages to investors and leads to stock price adjustments is regarded as useful. This study examines the effectiveness of audit committees in improving earnings quality and informativeness, particularly among family-owned firms. Earnings informativeness was measured through the re...



Journal title:

Volume   Issue

Pages  -

Publication date: 2016